feat(langchain): mark LangChain root observations in metadata #1604
Conversation
@claude review
LGTM — clean, well-scoped change with focused tests.
Overview
Adds a _get_langchain_observation_metadata helper to CallbackHandler.py that wraps the existing __join_tags_and_metadata call and injects is_langchain_root: True when parent_run_id is None. The helper is applied consistently at all four callback entry points: on_chain_start, on_tool_start, on_retriever_start, and __on_llm_action. Two unit tests using a lightweight fake-client pattern are added to tests/test_langchain.py.
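Based on the description above, the helper presumably looks roughly like the following sketch. The `FakeHandler` class and the simplified `_join_tags_and_metadata` body are illustrative stand-ins for the real `CallbackHandler` internals, not the actual diff:

```python
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


class FakeHandler:
    """Minimal stand-in for CallbackHandler, enough to illustrate the helper."""

    def _join_tags_and_metadata(
        self,
        tags: Optional[List[str]],
        metadata: Optional[Dict[str, Any]],
    ) -> Optional[Dict[str, Any]]:
        # Simplified stand-in for the real __join_tags_and_metadata.
        if tags is None and metadata is None:
            return None
        merged: Dict[str, Any] = dict(metadata or {})
        if tags:
            merged["langchain_tags"] = list(tags)
        return merged

    def _get_langchain_observation_metadata(
        self,
        *,
        parent_run_id: Optional[UUID],
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> Optional[Dict[str, Any]]:
        observation_metadata = self._join_tags_and_metadata(tags, metadata)
        if parent_run_id is not None:
            # Child run: metadata passes through unchanged.
            return observation_metadata
        # Root run: copy before mutating so the joined dict is not aliased.
        root_metadata = dict(observation_metadata or {})
        root_metadata["is_langchain_root"] = True
        return root_metadata


handler = FakeHandler()
root = handler._get_langchain_observation_metadata(parent_run_id=None, metadata={"k": "v"})
child = handler._get_langchain_observation_metadata(parent_run_id=uuid4(), metadata={"k": "v"})
print(root)   # root metadata carries the flag
print(child)  # child metadata does not
```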
Security risks
None. The change only writes a boolean flag into observation metadata; it does not touch auth, credentials, or any trust boundary.
Level of scrutiny
Low. This is an additive, metadata-only change that cannot break existing behaviour: consumers who do not look for is_langchain_root are unaffected, and child observations are explicitly left unchanged. The helper correctly uses .copy() to avoid mutating the dict returned by __join_tags_and_metadata.
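The `.copy()` point can be seen in a few lines of plain Python; `joined` here is an illustrative stand-in for the dict returned by `__join_tags_and_metadata`:

```python
# Without a copy, adding the flag mutates the shared dict (aliasing).
joined = {"user_id": "abc"}
alias = joined
alias["is_langchain_root"] = True
print(joined)  # the flag has leaked into the original dict

# With a copy, the original dict is left untouched.
joined = {"user_id": "abc"}
root_metadata = joined.copy()
root_metadata["is_langchain_root"] = True
print(joined)  # no flag here
```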
Other factors
The two new tests directly exercise the flag for both the chain and LLM paths without requiring a live API, making them fast and reliable. The implementation follows the existing codebase style throughout.
The asserted lines under review:

```python
assert langchain_generation_span.input != ""
assert langchain_generation_span.output is not None
assert langchain_generation_span.output != ""
assert langchain_generation_span.metadata["is_langchain_root"] is True
```
🔴 The test test_callback_generated_from_trace_chat incorrectly asserts is_langchain_root on the GENERATION span rather than the CHAIN wrapper that is the actual LangChain root. When ChatOpenAI.invoke() is called, LangChain fires on_chain_start with parent_run_id=None first (creating a CHAIN wrapper), then on_chat_model_start with a non-None parent_run_id, so the flag lands on the CHAIN, not the GENERATION. The test should filter for the root CHAIN span (as test_callback_generated_from_lcel_chain correctly does) rather than asserting is_langchain_root on the GENERATION span.
What the bug is and how it manifests
Line 70 asserts langchain_generation_span.metadata["is_langchain_root"] is True on the ChatOpenAI GENERATION span. The _get_langchain_observation_metadata helper only sets is_langchain_root=True when parent_run_id is None. When ChatOpenAI.invoke() is called, LangChain fires two callbacks: on_chain_start (with parent_run_id=None, creating a CHAIN wrapper as the LangChain root) and then on_chat_model_start (with parent_run_id=<chain_run_id>, a non-None value). So the CHAIN wrapper gets is_langchain_root=True, while the GENERATION does not.
The specific code path that triggers it
In __on_llm_action, the metadata is set via self._get_langchain_observation_metadata(parent_run_id=parent_run_id, ...). When on_chat_model_start fires for a ChatOpenAI.invoke() call, parent_run_id equals the run ID of the previously created CHAIN wrapper (not None). The helper function reaches if parent_run_id is not None: return observation_metadata without setting the is_langchain_root key.
Why existing evidence confirms this, not refutes it
The test itself already asserts len(trace.observations) == 3 (line 54), which means there are exactly 3 observations: (1) the Langfuse parent from start_as_current_observation, (2) a CHAIN wrapper created by on_chain_start, and (3) a GENERATION created by on_chat_model_start. If LangChain only fired on_chat_model_start with parent_run_id=None (as refuters claim), there would be only 2 observations total, contradicting line 54. This count of 3 is also independently confirmed by test_multimodal, which similarly calls model.invoke() directly and asserts len(trace.observations) == 3. Furthermore, git history shows commit 840cf2a explicitly changed the observation count in this test from 2 to 3, documenting the LangChain behavioral change where on_chain_start now fires before on_chat_model_start even for direct model invocations.
Step-by-step proof
1. `chat.invoke(messages, config={"callbacks": [handler]})` is called.
2. LangChain fires `on_chain_start(run_id=UUID_A, parent_run_id=None)` → `_get_langchain_observation_metadata` sees `parent_run_id is None` and sets `is_langchain_root=True` on the CHAIN observation.
3. LangChain fires `on_chat_model_start(run_id=UUID_B, parent_run_id=UUID_A)` → `_get_langchain_observation_metadata` sees `parent_run_id is not None` and returns metadata without `is_langchain_root`.
4. The test filters for `o.type == "GENERATION" and o.name == "ChatOpenAI"`, finding observation UUID_B.
5. `langchain_generation_span.metadata["is_langchain_root"]` is either missing or `None`, causing the assertion on line 70 to fail (either `KeyError` or `AssertionError`).
How to fix it
Mirror the pattern used in test_callback_generated_from_lcel_chain: filter all observations for those with observation.metadata.get("is_langchain_root") and assert the single result has type == "CHAIN". Remove the incorrect line 70 assertion from test_callback_generated_from_trace_chat and replace it with a check that the CHAIN wrapper (not the GENERATION) carries the flag.
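A sketch of the suggested fix, using hypothetical observation records shaped like the trace described above (one Langfuse parent, one CHAIN wrapper, one GENERATION; names are illustrative, not the real test fixtures):

```python
# Hypothetical observation records mirroring the 3-observation trace.
observations = [
    {"type": "SPAN", "name": "parent", "metadata": {}},
    {"type": "CHAIN", "name": "RunnableSequence", "metadata": {"is_langchain_root": True}},
    {"type": "GENERATION", "name": "ChatOpenAI", "metadata": {}},
]

# Filter for the flag instead of asserting it on the GENERATION span.
roots = [o for o in observations if o["metadata"].get("is_langchain_root")]
assert len(roots) == 1
assert roots[0]["type"] == "CHAIN"
print("root type:", roots[0]["type"])
```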
Summary
- Mark LangChain root observations with `is_langchain_root: true` in callback metadata

Why
Root observations created through the LangChain callback handler were not explicitly marked in observation metadata. That made it harder for downstream consumers to identify the LangChain-created root observation reliably.
Impact
Consumers can now detect root LangChain observations via the `is_langchain_root` metadata key without changing child observation behavior.

Validation
```shell
uv run pytest -q tests/test_langchain.py -k "is_langchain_root_metadata"
```

Disclaimer: Experimental PR review
Greptile Summary
This PR adds an `is_langchain_root: True` metadata flag to all root (top-level, no `parent_run_id`) LangChain observations, making it easier for downstream consumers to identify the entry-point span of a LangChain-driven trace without inspecting the call hierarchy.

Key changes:
- `_get_langchain_observation_metadata` wraps the existing `__join_tags_and_metadata`, injecting `is_langchain_root: True` when `parent_run_id is None`; child observations are unaffected.
- The helper is applied at `on_chain_start`, `on_tool_start`, `on_retriever_start`, and `__on_llm_action` (covering both `on_llm_start` and `on_chat_model_start`).
- `on_tool_start` merges arbitrary extra kwargs into `meta` after the flag is set, meaning a caller who happens to pass `is_langchain_root` in kwargs could silently overwrite it; the flag should ideally be applied last to be safe.
- `importlib.import_module(...)` is called inside the `_patch_langchain_client` test helper rather than as a top-level import, which is inconsistent with the project's import convention.

Confidence Score: 5/5
`is_langchain_root` is injected only for root observations and never touches child spans. The two new tests exercise the changed paths with a proper fake, and the existing test suite is unchanged. Both open findings are P2: a test-only import style issue and a theoretical (not practically reachable today) flag-overwrite edge case in `on_tool_start`.

Important Files Changed
- `langfuse/langchain/CallbackHandler.py`: new `_get_langchain_observation_metadata` helper that wraps `__join_tags_and_metadata` and injects `is_langchain_root: True` for all root observations (no `parent_run_id`); applied consistently to `on_chain_start`, `on_tool_start`, `on_retriever_start`, and `__on_llm_action`.
- `tests/test_langchain.py`: new tests; `importlib.import_module` is used inside a helper function instead of a top-level import.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["LangChain callback fires\n(on_chain_start / on_llm_start /\non_tool_start / on_retriever_start)"] --> B{"parent_run_id\nis None?"}
    B -- "Yes (root run)" --> C["__join_tags_and_metadata\n(tags, metadata)"]
    C --> D["observation_metadata\n(may be None)"]
    D --> E["root_metadata = observation_metadata.copy() or {}"]
    E --> F["root_metadata['is_langchain_root'] = True"]
    F --> G["start_observation(metadata=root_metadata)"]
    B -- "No (child run)" --> H["__join_tags_and_metadata\n(tags, metadata)"]
    H --> I["start_observation(metadata=observation_metadata)"]
```

Comments Outside Diff (1)
langfuse/langchain/CallbackHandler.py, lines 719-730

`is_langchain_root` can be silently overwritten by kwargs

For `on_tool_start`, after `_get_langchain_observation_metadata` returns `{"is_langchain_root": True}` (root run, no tags/metadata), the `meta.update(kwargs)` call immediately below merges all non-None `kwargs` into `meta`. If a caller ever passes `is_langchain_root` among the extra keyword arguments, the flag will be silently overwritten without warning.

While this is unlikely in practice today, setting the flag after the kwargs merge would be more resilient:
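A minimal sketch of that ordering; the standalone `build_tool_metadata` function is illustrative, not the actual `on_tool_start` code:

```python
# Illustrative on_tool_start metadata build: merge kwargs first, then set the
# flag, so a stray `is_langchain_root` in kwargs cannot clobber it.
def build_tool_metadata(parent_run_id, base_metadata, **kwargs):
    meta = dict(base_metadata or {})
    # Merge arbitrary extra kwargs first (dropping None values).
    meta.update({k: v for k, v in kwargs.items() if v is not None})
    # Apply the root flag last so it always wins for root runs.
    if parent_run_id is None:
        meta["is_langchain_root"] = True
    return meta


# Even a hostile kwarg cannot overwrite the flag on a root run.
print(build_tool_metadata(None, {"a": 1}, is_langchain_root=False))
```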
Alternatively, the `_get_langchain_observation_metadata` helper itself could be the canonical place, but the `update` call should happen first so the flag wins.

Reviews (1): Last reviewed commit: "Merge branch 'main' into codex/langchain..."
Context used: Learnt from langfuse/langfuse-python#1387